Investigating a heterogeneous data integration approach for data warehousing

Author

  • Hao Fan
Abstract

Data warehouses integrate data from remote, heterogeneous, autonomous data sources into a materialised central database. The heterogeneity of these data sources has two aspects: model heterogeneity, where data is expressed in different data models, and schema heterogeneity, where data is expressed in different schemas of the same data model. AutoMed (see http://www.doc.ic.ac.uk/automed/) is an approach to heterogeneous data transformation and integration based on reversible schema transformation sequences, which makes it capable of handling data integration across heterogeneous data sources. So far, this approach has been used only for virtual data integration. In this thesis, we investigate its use for materialised data integration. We investigate how AutoMed metadata can be used to express the schemas present in a data warehouse environment and to represent data warehouse processes such as data transformation, data cleansing, data integration, and data summarization. We discuss how the approach can be used to handle schema evolution in such a materialised data integration scenario: that is, if a data source or data warehouse schema evolves, how the integrated metadata and data can also be evolved so that the previous integration effort is reused as much as possible. We then describe in detail how the approach can be used for two key data warehousing activities, namely data lineage tracing and incremental view...

Similar resources

Integration and dimensional modeling approaches for complex data warehousing

With the broad development of the World Wide Web, various kinds of heterogeneous data (including multimedia data) are now available to decision support tasks. A data warehousing approach is often adopted to prepare data for relevant analysis. Data integration and dimensional modeling indeed allow the creation of appropriate analysis contexts. However, the existing data warehousing tools are wel...

Schema Evolution in Data Warehousing Environments - A Schema Transformation-Based Approach

In heterogeneous data warehousing environments, autonomous data sources are integrated into a materialised integrated database. The schemas of the data sources and the integrated database may be expressed in different modelling languages. It is possible for either the data source schemas or the warehouse schema to evolve. This evolution may include evolution of the schema, or evolution of the m...

Toward Active XML Data Warehousing

Warehousing data is not a trivial task, particularly when dealing with huge amounts of distributed and heterogeneous data. Moreover, traditional decision support systems do not feature intelligent capabilities for integrating such complex data. Therefore, we propose an approach for intelligent decision support based on active XML warehousing. We exploit XML as a pivot language in order to unify...

Complex Data Integration Based on a Multi-agent System

The expansion of the WWW and the growth of data sources lead to the proliferation of heterogeneous data (texts, images, videos, sounds and relational views). We call these data "complex data". In order to explore them, we need to carry out their integration into a unified format. Collecting, structuring and storing constitute the different tasks of complex data integration. There exist many ap...

A committee machine approach for predicting permeability from well log data: a case study from a heterogeneous carbonate reservoir, Balal oil Field, Persian Gulf

The permeability prediction problem has been examined using several methods, such as empirical formulas, regression analysis, and intelligent systems, especially neural networks and fuzzy logic. This study proposes an improved and novel model for predicting permeability from conventional well log data. The methodology integrates empirical formulas, multiple regression and neuro-fuzzy in a commi...

Publication date: 2005